Abstract

Individuals’ access to information in a social network depends on how it is
distributed and where in the network individuals position themselves. However,
individuals have limited capacity to manage their social connections and
process information. In this work, we study how this limited capacity and
network structure interact to affect the diversity of information social media
users receive. Previous studies of the role of networks in information access
were limited in their ability to measure the diversity of information. We
address this problem by learning the topics of interest to social media users
by observing messages they share online with their followers. We present a
probabilistic model that incorporates human cognitive constraints in a
generative model of information sharing. We then use the topics learned by the
model to measure the diversity of information users receive from their social
media contacts. We confirm that users in structurally diverse network
positions, which bridge otherwise disconnected regions of the follower graph,
are exposed to more diverse information. In addition, we identify user effort
as an important variable that mediates access to diverse information in social
media. Users who invest more effort into their activity on the site not only
place themselves in more structurally diverse positions within the network
than the less engaged users, but they also receive more diverse information
when located in similar network positions. These findings indicate that the
relationship between network structure and access to information in networks
is more nuanced than previously thought.


Introduction 


People use their social contacts to gain access to information in social
networks (Granovetter 1973; Burt 2004), which they can then leverage for
personal advantage. However, information in social networks is non-uniformly
distributed, leading sociologists to explore the relationship between an
individual’s network position and the novelty and diversity of information she
receives through her social contacts. Studies of social and organizational
networks identified the importance of so-called brokerage positions, which
link individuals to otherwise unconnected people (Granovetter 1973; Burt 1995;
Burt 2005; Aral and Van Alstyne 2011). By spanning distinct communities,
brokerage positions expose individuals to novel and diverse information, which
leads to new job prospects (Granovetter 1973) and higher compensation (Burt
1995; Burt 2004). However, the links that connect individuals in brokerage
positions to the rest of the network generally represent weaker relationships
(i.e., acquaintances rather than close friends) (Granovetter 1973; Onnela et
al. 2007). The less frequent interactions along these “weak” links limit the
amount of information flowing to individuals (Aral and Van Alstyne 2011).
Thus, those who are able, and willing, to invest greater effort in social
interactions will manage more connections, thereby increasing the volume of
information they receive through those links (Aral and David 2012; Miritello
et al. 2013b). Specifically, Aral and Van Alstyne (2011) showed that
individuals can increase the diversity and novelty of information they receive
via email either by placing themselves in brokerage positions, or by
communicating more frequently with their social contacts.

In contrast to email and phone interactions, where information is exchanged
between a pair of social contacts, social media users broadcast information to
all their contacts. Bakshy et al. (2012) showed that weak links collectively
deliver more novel information to Facebook users, even though they interact
infrequently with these contacts. These findings suggest that an easy way for
social media users to increase their access to diverse information is by
creating more links, e.g., by following other users. However, cognitive (and
temporal) constraints limit an individual’s capacity to manage social
interactions (Dunbar 1992; Goncalves, Perra, and Vespignani 2011; Miritello et
al. 2013b) and process the information they receive (Weng et al. 2012; Hodas
and Lerman 2012). In addition, social media users vary greatly in the effort
they expend engaging with the site, leading to a large variation in user
activity, as measured by the number of messages posted on the site (Wilkinson
2008). The impact of this variation on the information individuals receive and
their position in the network is not known. Do users who are able (or at least
willing) to be more active on the site receive more diverse information? Do
they curate their social links so as to move themselves into network positions
that provide more diverse information?

In this work, we use data from the microblogging site Twitter to study the
interplay between network structure, the effort Twitter users are willing to
invest in engaging with the site, and the diversity of information they
receive from their contacts. Previous studies of the role of networks in
individuals’ access to information were limited in their ability to measure
the diversity of information, using bag-of-words (Aral and Van Alstyne 2011)
or predefined categories (Kang and Lerman 2013b) for this task. In this work,
we learn topics of interest to social media users from the messages they share
with their followers. We present a probabilistic topic model that incorporates
human cognitive constraints in a generative model of information sharing and
evaluate the model on the task of predicting the messages users retweet. We
demonstrate that our model has competitive performance and, unlike other
models, it produces descriptions of topics.

We use learned topics to measure the diversity of information users receive
from their contacts. This enables us to study the factors that affect the
diversity of information in networks. Our findings indicate that the
relationship between network structure and access to information is more
nuanced than previously thought. First, users cannot increase the diversity of
the information they receive by increasing the number of their contacts.
Second, we confirm that users in structurally diverse network positions, which
bridge otherwise disconnected regions of the follower graph, are exposed to
more diverse topics via their contacts than users in less structurally diverse
positions. However, we demonstrate that user effort is an important variable
mediating access to information in networks. Active users who post more
messages on Twitter receive more diverse information even when they are in
structurally similar positions to the less active users. This suggests that
users who are willing (or able) to engage more on Twitter curate their
contacts so as to increase the diversity of the information they receive.
Since effort is a useful proxy for an individual’s cognitive capacity for (or
at least the willingness to invest the time in) processing information in
social networks (Miritello et al. 2013a), our work suggests that cognitive
factors interact in non-trivial ways with network structure to define access
to information in social networks.


Description of Data 

Twitter is an online social networking and microblogging service that allows
users to follow the activity of others to see the messages they posted or
retweeted recently. When a user posts or retweets a message, it is broadcast
to all her followers, who are then able to see it in their own streams.
Twitter offers an Application Programming Interface (API) for data collection.
We used two data sets collected in the past from Twitter. The 2012 data set
(Kang and Lerman 2015) contains tweets including a URL, collected to monitor
information spread over the social network from Nov 2011 to Jul 2012. The
authors started by monitoring potential seed URLs containing http://t.co from
the streaming APIs and collected all tweets containing them. Since the total
volume of tweets containing a URL is very large, they focused on broadly
shared URLs. They selected as seeds the URLs that appeared more than once
within five days of their initial appearance in the streaming APIs, based on
the heuristic that URLs that appear more often in the streaming APIs will be
more popular on Twitter. They collected the entire history of these seed URLs
from the Twitter REST APIs until no more tweets containing them appeared
within five days of their last appearance. This yielded 12.5M tweets with 9.5M
users.

The 2014 data set contains the tweets from 5,600 initial seed users (Smith et
al. 2013) and their friends from Mar 2014 to Oct 2014. Starting with the 5,600
initial seed users, they collected all their friends and at least the first
200 tweets from each timeline. The data set includes 23.8M tweets from 1.9M
users with 17.8M social network links.

Probabilistic Model of User Topics 

We use a probabilistic model to learn users’ topics of interest from the
messages they share in social media. What information users share, and which
messages shared by friends they decide to spread to their followers, depends
on a number of factors, such as virality of information being shared, users’
tastes, and their followers’ tastes. To understand information sharing in
social networks, social recommendation models (Ma et al. 2008; Wang and Blei
2011; Kang, Lerman, and Getoor 2013) were used to represent users’ interests
and items they share by K-dimensional topic vectors. Once these hidden topic
vectors are learned from a user’s item adoption (i.e., retweeting) history, it
is possible to calculate the personal relevance of a new item to the user.

We proposed VIP (Kang and Lerman 2015), a model that captures the three basic
ingredients of information spread in social media: item’s visibility ($v$) to
a user, its fitness or virality ($\eta$), and its (personal) relevance
($\delta$) to the user. While the model improves on previous models, it
applies normal distribution assumptions to modeling binary responses, uses the
full user-item adoption matrix, and provides no descriptions of the learned
latent topic space. In this paper, we model binary responses (adopted vs
unadopted items) of social media users with a multinomial logit model.
Stochastic optimization allows us to learn from randomly sampled negative (not
adopted) and positive (adopted) dyads without overfitting to the positive
ones. Our stochastic inference algorithm handles many user-item dyads and can
be distributed for efficient computation. Furthermore, with the help of a
probabilistic topic model, we can provide an interpretable low-dimensional
representation of information. Figure 1 graphically represents our model.


Item visibility When a user’s message stream is delivered as a list of items,
the process of item discovery is biased by the position of each item in the
list. A user is more likely to see items near the top of the list than those
deeper in the stream (Lerman and Hogg 2014). Hence, items in top stream
positions have higher visibility. Since we do not know an item’s exact
position, we estimate it as the average visibility of items to user i as
follows:

$$v_i \sim \sum_{L} G\big(1/(1+\rho_i), L\big)\,\big(1 - IG(\mu, \lambda, L)\big) \qquad (1)$$

The first factor gives the probability that user i discovers an item depending
on the number of items in her stream. The greater the number of new messages
the user receives between visits to the site, the less likely the user is to
view any specific item. Thus, average visibility depends on the frequency with
which the user visits the site and the rate of posts received. This
competition between the rate at which friends post new messages to the user’s
stream and the rate at which the user visits the stream to read the messages
is modeled by a geometric distribution with success probability
$p = 1/(1+\rho_i)$: $G = (1-p)^L p$. The ratio $\rho_i$ of these rates gives
the expected number of new messages in a user’s stream. The second factor
gives the probability that user i will navigate to at least the $(L+1)$-th
position in the stream to view the item. This is estimated by the upper
cumulative distribution of an inverse Gaussian $IG$ with mean $\mu$, shape
parameter $\lambda$ and variance $\mu^3/\lambda$:

$$\exp\left[\frac{-\lambda (L-\mu)^2}{2\mu^2 L}\right] \left[\frac{\lambda}{2\pi L^3}\right]^{1/2} \qquad (2)$$

Figure 1: Our model with user topic ($u$) and item topic ($\theta$) profiles,
item’s personal relevance ($\delta$) and visibility to user ($v$), item
fitness ($\eta$), expected number of new posts user received ($\rho$) and item
adoption ($r$). Topic model part has the topic distribution ($\phi$) of an
item and a distribution ($\beta$) over words from a vocabulary of size M. N is
the number of users, and D is the number of items.

Item virality Social media users adopt items even if they had not earlier
demonstrated a sustained interest in their topics. This is often the case with
viral, general-interest items, such as breaking news or celebrity gossip.
Thus, we use “virality” to represent an item’s propensity to spread on
exposure.

$$\eta_j \sim \mathcal{N}(0, \sigma_\eta^2) \qquad (3)$$

Item relevance We calculate personal relevance of an item j to user i as:

$$\delta_{ij} \sim g_\delta(u_i^T \theta_j) \qquad (4)$$

where the symbol $T$ refers to the transpose operation, $u_i$ represents the
topic profile of user i, $\theta_j$ represents the topic profile of item j and
$g_\delta$ is a linear function for simplicity.

$$u_i \sim \mathcal{N}(0, \sigma_u^2 I_K)$$
$$\theta_j \sim \mathcal{N}(0, \sigma_\theta^2 I_K) \qquad (5)$$

where K is the number of topics.

We use a widely known text mining algorithm, Latent Dirichlet Allocation (LDA)
(Blei, Ng, and Jordan 2003), which analyzes the co-occurrence of the words in
documents, to learn the hidden topics representing the documents. In our case,
LDA captures the item’s topic distribution $\phi$, which is represented as a
K-dimensional vector in the recommendation model. The topic distribution of
each document ($\phi_{d_j}$) is viewed as a mixture of multiple topics, with
each topic ($\beta_k$) as a distribution over words. In our setting, the
corpus $D$ is the collection of the text of the tweet posts. The likelihood of
$D$ is computed by multiplying over all documents and all words in each
document as follows:

$$p(D \mid \beta, \phi, z) = \prod_{d_j \in D} \prod_{w \in d_j} \phi_{d_j, z_w}\, \beta_{z_w, w} \qquad (6)$$

where $z_w$ is the assigned topic index for each word $w$ in the document
$d_j$, $\phi_{d_j, z_w}$ is the likelihood of topic $z_w$ for the document
$d_j$ and $\beta_{z_w, w}$ is the likelihood of choosing the specific word $w$
for the topic $z_w$.

The generative process for item adoption through a social 
stream can be formalized as follows: 


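For each user i
    Generate $u_i \sim \mathcal{N}(0, \sigma_u^2 I_K)$
    Generate $v_i \sim \sum_{l} G(1/(1+\rho_i), l)\,(1 - IG(\mu, \lambda, l))$
For each item j
    Generate $\eta_j \sim \mathcal{N}(0, \sigma_\eta^2)$
    Generate $\phi_j \sim \mathrm{Dirichlet}(\alpha)$
    Generate $\epsilon_j \sim \mathcal{N}(0, \sigma_\theta^2 I_K)$ and set $\theta_j = \epsilon_j + \phi_j$
    For each word $w_{jm}$
        Generate topic assignment $z_{jm} \sim \mathrm{Mult}(\phi_j)$
        Generate word $w_{jm} \sim \mathrm{Mult}(\beta_{z_{jm}})$
For each user i
    For each item j on the news feed
        Generate the adoption $r_{ij} \sim p(I(r_{ij}) \mid u_i, v_i, \theta_j, \eta_j, O_i)$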


Lack of adoption by user i of item j ($r_{ij} = 0$) can be interpreted in two
ways: either the user saw the item but did not like it, or the user did not
see the item but may have liked it had she seen it. While other models partly
account for the lack of knowledge about non-adoptions using smoothing (Wang
and Blei 2011; Kang and Lerman 2013a), we properly model visibility of items
to users.
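As an illustration, the average visibility of Eq. (1) can be sketched numerically. This is only a sketch under stated assumptions: it reads $IG(\mu, \lambda, L)$ in Eq. (1) as the cumulative distribution function of the inverse Gaussian, uses the standard closed form of that CDF, and truncates the sum at a hypothetical `max_depth`. The function names are illustrative and not taken from the paper.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x: float, mu: float, lam: float) -> float:
    """CDF of an inverse Gaussian with mean mu and shape lam (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    root = math.sqrt(lam / x)
    return (normal_cdf(root * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * normal_cdf(-root * (x / mu + 1.0)))


def average_visibility(rho: float, mu: float = 14.0, lam: float = 14.0,
                       max_depth: int = 5000) -> float:
    """Sum over positions L of P(item sits at L) * P(user scrolls past L).

    rho is the expected number of new posts between visits; max_depth is an
    assumed truncation point for the infinite sum.
    """
    p = 1.0 / (1.0 + rho)  # success probability of the geometric term
    total = 0.0
    for depth in range(max_depth):
        at_depth = (1.0 - p) ** depth * p                    # G = (1 - p)^L p
        reaches = 1.0 - inverse_gaussian_cdf(depth, mu, lam)  # upper tail
        total += at_depth * reaches
    return total
```

Under these assumptions, a user who receives no new posts between visits (`rho = 0`) has visibility 1, and visibility falls as `rho` grows.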

We model the user-item adoption with a Softmax function, which maps the values
computed from the K-dimensional vectors into the [0, 1] range. The equation is
as follows:

$$p(I(r_{ij}) \mid u_i, v_i, \theta_j, \eta_j, O_i) = \frac{\exp\big(v_i\, g_r(\delta_{ij} + \eta_j)\big)}{\sum_{l \in O_i} \exp\big(v_i\, g_r(\delta_{il} + \eta_l)\big)} \qquad (7)$$

where $I(\cdot)$ is the indicator function, $I(r_{ij}) = 1$ when user i
adopted item j and 0 otherwise, and $O_i$ is the set of items observed by user
i. We define $g_r$ as a linear function for simplicity.
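The Softmax of Eq. (7) can be sketched as follows, assuming $g_r$ is the identity and $\delta_{ij} = u_i^T \theta_j$. The function name and the dictionary layout are hypothetical; subtracting the maximum score is a standard numerical-stability step and does not change the result.

```python
import math
from typing import Dict, List


def adoption_probabilities(v_i: float, u_i: List[float],
                           theta: Dict[str, List[float]],
                           eta: Dict[str, float]) -> Dict[str, float]:
    """Softmax over the observed items O_i (the keys of `theta`)."""
    scores = {}
    for item, theta_j in theta.items():
        delta_ij = sum(a * b for a, b in zip(u_i, theta_j))  # u_i^T theta_j
        scores[item] = v_i * (delta_ij + eta[item])          # identity g_r
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {item: math.exp(s - top) for item, s in scores.items()}
    norm = sum(exps.values())
    return {item: e / norm for item, e in exps.items()}
```

With `v_i = 0` every observed item receives the same probability, which matches the intuition that relevance and fitness cannot matter for items the user is unlikely to see.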



























The main objective function is Eq. (8).
The last term of the equation minimizes the error between the binary rating
and the predicted rating. The second line of the equation minimizes the error
between the topics that explain the recommendation and the content. The
relative importance of these two components can be controlled with
$\sigma_\theta$. MAP estimation is equivalent to maximizing the complete log
likelihood ($\mathcal{L}$) of $U$, $V$, $\theta$, $\eta$, $\phi$ and $r$ given
$\sigma_u$, $\sigma_\theta$, $\sigma_\eta$, $\mu$, $\lambda$ and $\rho$.


Model Learning 

To optimize Eq. (8), we develop a stochastic gradient descent algorithm. Given
a current estimate, we take the gradient of Eq. (8) with respect to $u_i$,
$\theta_j$ and $\eta_j$ and iteratively optimize the parameters
$\{u_i, \theta_j, \eta_j\}$. Derived update equations are:


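Algorithm 1 Stochastic Optimization
Initialize model parameters $U$, $V$, $\theta$, $\eta$, $\phi$, $\nabla$
for t = 1 to T do
    for i in U do
        Choose a random mini batch $S_i$ of size $|r_i|$ from $D - r_i$
        Generate $O_i = r_i \cup S_i$
        for j in $O_i$ do
            $u_i \leftarrow u_i - \rho \left[ \nabla v_i \theta_j + \frac{1}{2 |r_i| \sigma_u^2} u_i \right]$
            $\theta_j \leftarrow \theta_j - \rho \left[ \nabla v_i u_i + \frac{1}{2 |r_j| \sigma_\theta^2} (\theta_j - \phi_j) \right]$
            $\eta_j \leftarrow \eta_j - \rho \left[ \nabla v_i + \frac{1}{2 |r_j| \sigma_\eta^2} \eta_j \right]$
        end for
    end for
end for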


where $|r_i|$ is the number of items adopted by user i and $|r_j|$ is the
number of users who adopted item j. We generate a set of observed items $O_i$
by adding $|r_i|$ randomly sampled items from the unadopted set ($D - r_i$)
and incrementally learning from the unadopted and adopted item set of each
user. We use the learning rate $\rho$ with discount by a factor of 0.9 in each
iteration (Koren, Bell, and Volinsky 2009).
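The sampling of unadopted items and the learning-rate discount described above can be sketched as below. This is a minimal sketch, not the authors' implementation: the helper names are hypothetical, and it assumes the negative sample is capped at the number of available unadopted items.

```python
import random
from typing import List, Set


def observed_items(adopted: Set[str], all_items: Set[str],
                   rng: random.Random) -> List[str]:
    """Build O_i: the adopted items plus an equal-sized random sample of
    unadopted items (capped by how many unadopted items exist)."""
    unadopted = sorted(all_items - adopted)
    n_negative = min(len(adopted), len(unadopted))
    negatives = rng.sample(unadopted, n_negative)
    return sorted(adopted) + negatives


def learning_rate(initial: float, iteration: int, decay: float = 0.9) -> float:
    """Learning rate after `iteration` rounds of multiplicative discount."""
    return initial * decay ** iteration
```

Drawing a fresh negative sample for every user in every pass means the model sees different unadopted items over time without ever materializing the full user-item matrix.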

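The equation for the gradient ($\nabla$) is as follows:

$$\nabla = \frac{\exp\big(v_i\, g_r(\delta_{ij} + \eta_j)\big)}{\sum_{l \in O_i} \exp\big(v_i\, g_r(\delta_{il} + \eta_l)\big)} - I(r_{ij}) \qquad (9)$$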


Table 1: Model parameters used in this study.

Parameters               Value
number of topics         $K = 100$
user topic profile
item topic profile       $\sigma_\theta^2 = 10$
item fitness             $\sigma_\eta^2 = 10$
law of surfing           $\mu = 14.0$, $\lambda = 14.0$
views per post           38
typical posting rates    1.4


The proposed recommendation model can be updated incrementally to model
dynamic user adoptions in real time. It is also computationally efficient
since it can be distributed by decomposing the data set over multiple
computers.


Model Selection 

We use the same “law of surfing” parameters, $\mu = 14.0$ and
$\lambda = 14.0$, as (Kang and Lerman 2015; Hogg, Lerman, and Smith 2013; Hogg
and Lerman 2012) did in their study of social media. The expected number of
new posts including a URL user i received, $\rho_i$, is computed by
$rate_i^{posts\ received} / rate_i^{visits}$. The rate
$rate_i^{posts\ received}$ is proportional to the number of friends
($N_{frd(i)}$) user i follows and their average posting frequency. To estimate
the posting frequency of all users, we use the typical URL posting rates of
users from our data: $rate_i^{posts\ received} = 1.4 * N_{frd(i)}$. We
estimate user i’s visiting rate ($rate_i^{visits}$) using the number of posts
of user i ($N_{posts(i)}$). (Hogg, Lerman, and Smith 2013) estimated that the
average number of visits per post was 38 (2014 data set) for Twitter users.
Also, since around 20% of tweets include a URL (Chaudhry et al. 2012), the
visiting rate of user i becomes $rate_i^{visits} = 7.6 * N_{posts(i)}$ (2012
data set).
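The estimate of $\rho_i$ described above reduces to a ratio of two rates. A minimal sketch, using the constants quoted in the text (1.4 URL posts per friend, and 38 or 7.6 visits per post); the function name and the guard against users with no posts are assumptions for illustration.

```python
def expected_new_posts(n_friends: int, n_posts: int,
                       posts_per_friend: float = 1.4,
                       visits_per_post: float = 38.0) -> float:
    """rho_i = rate of posts received / rate of visits."""
    if n_posts <= 0:
        raise ValueError("visiting rate is undefined for users with no posts")
    rate_received = posts_per_friend * n_friends  # 1.4 * N_frd(i)
    rate_visits = visits_per_post * n_posts       # 38 (or 7.6) * N_posts(i)
    return rate_received / rate_visits
```

For example, a user with 100 friends and 10 posts gets `1.4 * 100 / (38 * 10)`, roughly 0.37 expected new posts per visit; passing `visits_per_post=7.6` applies the URL-only setting.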

For the model hyper-parameters, we vary the parameters
$K \in \{10, 30, 50, 100, 200\}$ and
$\{\lambda_u, \lambda_\theta\} \in \{10^{-4}, 10^{-3}, \ldots, 10^{4}\}$ by
using grid search on the validation set. Throughout this paper, we set
parameters $K = 100$, $\lambda_u = 0.01$, $\lambda_\theta = 0.001$, both for
PMF and CTR, which performed the best for PMF. For the fitness parameter of
VIP (Kang and Lerman 2015) and the proposed model, we vary
$\sigma_\eta^2 \in \{10^{-4}, 10^{-3}, \ldots, 10^{4}\}$, while we fix other
parameters: $\sigma_\theta^2 = 10^4$ and $\sigma_u^2 = 10^4$. In this paper,
we set $\sigma_\eta^2 = 10$.


Model Evaluation 

We evaluate the proposed model by using it to predict which items users will
adopt. For this task, user i’s adoption of item j shared by a friend is
obtained by point estimation with optimal variables
$\{\theta^*, u^*, v^*, \eta^*\}$:

$$E[r_{ij} \mid \mathcal{D}] \approx E[v_i \mid \mathcal{D}] \left(E[\delta_{ij} \mid \mathcal{D}] + E[\eta_j \mid \mathcal{D}]\right)$$
$$r^*_{ij} \approx v^*_i \left({u^*_i}^T \theta^*_j + \eta^*_j\right) \qquad (10)$$

where $\mathcal{D}$ is the training data. The adoption probability is decided
by user visibility $v^*$, user topic profile $u^*$, item topic profile
$\theta^*$, and item fitness $\eta^*$.

To evaluate the performance, we use precision (P), recall (R) and normalized
discounted cumulative gain (nDCG) for top-x recommended posts.

































Table 2: Overall prediction performance comparison using Precision@x (P@x),
Recall@x (R@x), normalized DCG@x (nDCG@x) on Twitter dataset.

Model         Text   P@10     R@10     nDCG@10
Random        No     0.0483   0.3738   0.2410
Fitness       No     0.0798   0.5924   0.3630
Relevance     No     0.0647   0.4383   0.3170
VIP           No     0.0984   0.6446   0.4205
Softmax-CTR   Yes    0.1047   0.6105   0.4123
Our Model     Yes    0.1138   0.7022   0.4619


P@x computes the fraction of items that are adopted by each user in the top-x
items in the list. We average the precision@x of all users.

R@x computes the fraction of adopted items that are successfully discovered in
the top-x ranked list out of all adopted items by each user. We average the
recall@x of all users.

nDCG@x computes the weighted score of adopted items based on the position in
the top-x list. It penalizes adopted items in the bottom of the top-x list. We
average the nDCG@x of all users.
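The three metrics can be sketched for a single user as below (the per-user values are then averaged). This sketch assumes binary relevance and the common logarithmic discount $1/\log_2(\text{position} + 1)$ for nDCG; the paper does not spell out its exact discount, so that choice is an assumption, as are the function names.

```python
import math
from typing import List, Set


def precision_at(ranked: List[str], adopted: Set[str], x: int) -> float:
    """Fraction of the top-x recommended items that the user adopted."""
    hits = sum(1 for item in ranked[:x] if item in adopted)
    return hits / x


def recall_at(ranked: List[str], adopted: Set[str], x: int) -> float:
    """Fraction of all adopted items that appear in the top-x list."""
    if not adopted:
        return 0.0
    hits = sum(1 for item in ranked[:x] if item in adopted)
    return hits / len(adopted)


def ndcg_at(ranked: List[str], adopted: Set[str], x: int) -> float:
    """Binary-relevance DCG of the top-x list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:x]) if item in adopted)
    ideal_hits = min(len(adopted), x)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Because the list positions enter only through the discount, nDCG rewards placing adopted items near the top, while precision and recall count hits regardless of position.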

We divide each user’s adopted items into five folds and construct the training
set and the test set. We use five-fold cross validation and compare
performance of the proposed model to five baseline models: Random, Fitness,
Relevance, VIP, CTR. The Random baseline chooses items at random from among
the items in user i’s stream, i.e., items adopted by i’s friends. The Fitness
baseline uses item fitness values ($\eta$) learned by VIP to recommend the k
highest fitness items. The Relevance baseline bases its recommendations on
user-topic and item-topic vectors learned by PMF. Collaborative Topic
Regression (CTR) (Wang and Blei 2011) was originally introduced to recommend
scientific articles. It combines collaborative filtering (PMF) and
probabilistic topic modeling (LDA). It captures two K-dimensional lower-rank
user and item hidden variables from the user-item adoption matrix and the
content of the items. This model uses textual information and negative dyads,
but unlike our method it uses an $\ell_2$ function instead of a Softmax. Here,
for a fair comparison, we implemented a Softmax version. Based on our
experiment, Softmax-CTR outperformed the original CTR due to the binary
adoptions of social media.

Table 2 shows the models’ overall performance on the user-item adoption
prediction task. In this paper, we set x = 10 since recommending too many
items is not realistic. From our experiments, we found that results are
consistent for different values of x. While nDCG@x uses the position of the
correct answer in the top-x ranked list, it does not penalize for unadopted
items or missing adopted items in the top-x ranked list; therefore, one has to
consider the performance of all three metrics together. Intuitively, a better
model should have higher P@x, R@x, and nDCG@x.

Table 3: Variables used in the study.

Var.       Description
$S_i$      number of active friends
$ND_i$     network diversity
$O_i$      avg. vol. of outgoing info. (# tweets/day)
$U_i$      user-topic vector (K-dimensional vector)
$FTD_i$    friend topic diversity

The experimental results show that the proposed model dramatically outperforms
the random model, by 135.61% on precision and 87.85% on recall. A comparison
against the random model is important to uncover the complexity of the
post-recommendation task. The Fitness and Relevance models yield 62.21% and
33.95% improvement over the random model in terms of precision, and 58.48% and
17.25% in terms of recall, respectively. The gain of VIP over Relevance is
52.08% on precision and 47.06% on recall, while that of CTR over Relevance is
61.82% on precision and 39.28% on recall. This shows that accounting for
cognitive biases improves predictability of user item adoptions in social
media about as much as accounting for the text description of items alone.
Among all models, the proposed model yields the best performance, showing that
modeling text, as well as visibility, is critical in social media
recommendation.
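The percentage gains quoted for the proposed model are relative improvements over a baseline score. As a small worked check, using the P@10 and R@10 values of the proposed model and the Random baseline from Table 2 (the function name is illustrative):

```python
def relative_gain(model_score: float, baseline_score: float) -> float:
    """Percentage improvement of a model score over a baseline score."""
    return (model_score / baseline_score - 1.0) * 100.0


# Proposed model vs. Random baseline, values taken from Table 2.
precision_gain = relative_gain(0.1138, 0.0483)
recall_gain = relative_gain(0.7022, 0.3738)
print(f"precision: {precision_gain:.2f}%  recall: {recall_gain:.2f}%")
```

Rounded to two decimals, these evaluate to the 135.61% and 87.85% reported in the text.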


Information Access in Networks 

We use the topics learned by the proposed model to study how information is
distributed in a network and what users can do to increase the diversity of
information they receive from their social media friends. In order to use the
messages users posted, in addition to friends’ messages they retweeted, we
changed the model by assigning visibility equal to one to each original
message a user posted.


Definition of Variables 


Following (Aral and Van Alstyne 2011; Aral and David 2012), we define a set of
variables we use to characterize users, their network position, and
information diversity.


Network size We define the network size $S_i$ of user i as the number of
friends from whom user i received messages during a time period $\Delta t$,
which we take to be the data collection period. We only consider active
friends, i.e., friends who posted messages during $\Delta t$. Network size is
defined as

$$S_i = \sum_{l \in N_i^{frd}} I(r_l) \qquad (11)$$

where $N_i^{frd}$ is the set of friends of user i and the indicator function
$I(r_l)$ is one if and only if friend l tweeted during the time period
$\Delta t$ and zero otherwise.


Network diversity A user’s position in a network significantly impacts the
diversity of received information. Position can be characterized by its
structural diversity, which represents how many otherwise unconnected contacts
user i has. We measure structural diversity of a network position using the
local clustering coefficient (Watts and Strogatz 1998), $C_i$, which
quantifies how often user i’s contacts are linked (regardless of the direction
of the link):

$$C_i = \frac{2 \times |\{e_{jk} : j, k \in N_i^{frd}, e_{jk} \in E\}|}{S_i (S_i - 1)} \qquad (12)$$






































The variable $e_{jk} = 1$ if user j follows user k or vice versa; otherwise,
$e_{jk} = 0$. The total number of possible connections among contacts is
$S_i (S_i - 1)$. A high clustering coefficient implies low network diversity,
and vice versa. Therefore, we define network diversity of user i as
$ND_i = 1 - C_i$. Note that brokerage positions have high network diversity,
while individuals in tightly-knit communities are in positions with low
network diversity.
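Network diversity as defined through Eq. (12) can be sketched from a follower edge list. The function name and input layout are hypothetical, and returning 1.0 for users with fewer than two active friends is an assumption, since the clustering coefficient is undefined in that case.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def network_diversity(friends: Set[str],
                      follows: Iterable[Tuple[str, str]]) -> float:
    """ND_i = 1 - C_i, with C_i the local clustering coefficient computed
    over the user's active friends, ignoring link direction."""
    size = len(friends)
    if size < 2:
        return 1.0  # assumption: no pair of contacts can be linked
    # Collapse directed follow edges into undirected links.
    linked = {frozenset(edge) for edge in follows if edge[0] != edge[1]}
    links = sum(1 for j, k in combinations(sorted(friends), 2)
                if frozenset((j, k)) in linked)
    clustering = 2.0 * links / (size * (size - 1))
    return 1.0 - clustering
```

For three friends with two of the three possible pairs linked, `clustering` is 2/3 and the network diversity is 1/3; a user whose friends never follow one another gets the maximum value of 1.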


User effort Most social media sites, including Twitter, display items from
friends as a chronologically ordered list, with the newest items at the top. A
user scans the list and if she finds an item interesting, she may share it
with her followers by retweeting it. She will continue scanning the list until
she loses interest or becomes distracted (Hodas and Lerman 2012). It is
difficult to quantify how much of the list a user processes, since the site
does not provide this information. Instead, we use user activity as a
heuristic for the effort users are willing (or able) to invest in Twitter. We
measure user i’s activity by the average number of messages the user tweets
and retweets per day:

$$O_i = \frac{|r_i|}{\Delta t} \qquad (13)$$

where $|r_i|$ is the number of tweets from user i.


Table 4: Keywords associated with the top 10 topics of users in different
positions within the network. Users are divided into two populations based on
their network diversity (ND).

#    Users in a Low ND                                           Users in a High ND
1    lesson weight loss acoustic lose motive guitar flash gain   profession connect profile webdesign bigdata update
2    pet dog animal adopt praise cat rescue love mate relax      children parent surgery inch anxiety obesity autism
3    read book review kindle novel cover publish buddha          united kingdom stadium arena holland yankees
4    good happy hope morn birthday wish love like                prosecute labour governor Palestinian nationwide peru
5    yoga workout exercise jump doctor fit body back diet        ferguson pray brooklyn documentary Oakland
6    graphic japanese poetry manga cinema photo                  art center science exhibit culture paper draw museum
7    oil kale gene napa sausage wrap aspire coal trainer         camera shoot timeline canon len accent timeline possess
8    children parent common journey ready pack escape            worldcup shout football soccer illinois player sold
9    home design studio site interior built lawn layout          space mars nasa planner newton isaac modern
10   beauty summer city park resort nation beach island          tree win get email gift chance enter offer ticket

Friend topic diversity We measure the diversity of information user i receives
from friends by the variance of friends’ topic interests: when most friends
have distinct, non-overlapping interests, topic diversity will be high,
whereas when most friends have similar topic interests it will be low. We
define friend topic diversity as the average pair-wise cosine distance of
friends’ topic interest vectors:

$$FTD_i = \frac{2 \times \sum_{j, k \in N_i^{frd},\, j < k} \big(1 - \mathrm{Cos}(U_j, U_k)\big)}{S_i (S_i - 1)} \qquad (14)$$


Information and Network Structure

Information is not uniformly distributed in a network: users in brokerage
positions are interested in systematically different topics than users within
denser communities. To study user-topic distribution, we rank users according
to network diversity (ND) and split them into two equal sized groups: high and
low network diversity. Table 4 compares the representative keywords of the top
ten topics from the topic profiles of users in these two groups. Users in high
network diversity positions tend to be interested in more general topics, such
as sports (“worldcup”, “yankees”, “lad”), current events (“ferguson”,
“Oakland”), business (“profession”, “bigdata”), health (“surgery”, “obesity”),
politics (“peru”, “Palestinian”), arts (“art”, “exhibit”, “camera”), science
(“science”, “nasa”, “space”), promotion (“gift”, “offer”), etc. According to
sociological theory, users in such brokerage positions spanning multiple
unconnected communities are exposed to diverse information (Burt 1993);
therefore, it makes sense that the topics they have in common are the more
general topics. On the other hand, users in positions of low network diversity
focused on more specialized topics, such as hobbies (“guitar”, “book”, “yoga”,
“manga”), pets (“dog”, “cat”), family (“birthday”, “children”), food (“oil”,
“kale”), vacation (“journey”, “escape”, “island”), home & garden (“home”,
“interior”).


Increasing Exposure to Diverse Information 

How can users increase the amount of diverse information they receive in
social media? Do they follow more people to increase the volume of information
received? Or do they move themselves into special network positions? To
examine how user effort affects information access, we split users into four
classes based on the average number of tweets they post daily ($O$). The top
quartile contains the most active users, who post more than 5.3 tweets per
day; the second quartile contains users who post from 3.1 to 5.3 tweets per
day; the third quartile contains users who post from 1.9 to 3.1 tweets per
day; and the bottom quartile contains users who post fewer than 1.9 tweets per
day.
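The effort classes can be expressed as a simple threshold rule on $O_i$, the average number of tweets per day. In this sketch the thresholds are the quartile boundaries quoted above; the function names and the treatment of values falling exactly on a boundary are assumptions.

```python
def user_effort(n_tweets: int, n_days: float) -> float:
    """O_i: average number of tweets and retweets per day."""
    return n_tweets / n_days


def effort_class(tweets_per_day: float) -> str:
    """Assign a user to an effort quartile using the reported boundaries.

    Values exactly on a boundary go to the lower class (an assumption).
    """
    if tweets_per_day > 5.3:
        return "top quartile"
    if tweets_per_day > 3.1:
        return "second quartile"
    if tweets_per_day > 1.9:
        return "third quartile"
    return "bottom quartile"
```

For instance, a user with 600 tweets over a 200-day collection period posts 3.0 tweets per day and falls in the third quartile.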

Figure 2 shows the relationship between diversity of received information,
measured by friend topic diversity ($FTD$), and user’s network size ($S$), for
these classes of Twitter users. The trends among these four classes of users
are somewhat different, indicating that people use different strategies to
access information in networks. Active users who expend more effort on Twitter
(red circles in Figure 2) increase their exposure to diverse information by
adding more friends (0.1874, p<.01). However, when the bottom quartile users
(blue squares in Figure 2) add friends, this actually decreases the diversity
of information they are exposed to until around 100 friends. After that point,
information diversity slowly increases. For the same network size, the less
active users actually receive more diverse information than the more active
users until around 100 friends. Apparently, network size itself cannot provide
access to diverse information (when S > 100) since the network structure can
vary significantly.

Figure 2: Diversity of received information as a function of
user’s network size, S (number of friends). Users are divided
into four populations based on their effort: red circles
represent the most active users (who post more than 5.3 tweets
per day on average), green stars represent the 2nd quartile
(3.1 < O < 5.3), black triangles represent the 3rd quartile
(1.9 < O < 3.1), and the blue squares represent the bottom
quartile users (who post fewer than 1.9 tweets per day on
average). We discretize values into equal-sized bins for each
quartile.

In addition to network size, network position is known to
play an important role in determining access to information.
In social and email communication networks, people in high
network diversity positions receive more novel and diverse
information (Granovetter 1973; Aral and Van Alstyne 2011;
Aral and David 2012). We tested whether the same conclusions
hold for Twitter using topics learned by the proposed
model. Figure 3 shows the relationship between friend topic
diversity (FTDi) and structural network diversity (NDi)
for the four classes of users divided according to their effort.
There is a strong correlation (0.9212, p<.01) between network
position and information diversity for bottom quartile
users (blue squares in Figure 3). Correlation values
decrease with increasing user effort (3rd quartile 0.9162,
p<.01; 2nd quartile 0.7774, p<.01). When these users
place themselves in more structurally diverse positions within
the Twitter network, they receive on average more topically
diverse tweets from friends than users who place themselves
in less structurally diverse network positions. However, the
correlation between FTD and ND for active users (red circles
in Figure 3) is far lower, 0.3248 (p<.01). These users are
generally exposed to more diverse information than the less
active users, regardless of their network position. Also, active
users in low network diversity positions receive more
diverse information than the less active users in similar
positions. These results demonstrate that the effort users are
willing to invest in using social media is an important factor
in access to diverse information.
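
A correlation between binned network diversity and friend topic diversity could be computed along the following lines. This is only a sketch under stated assumptions: "equal-sized bins" is read here as bins holding equal numbers of users, the coefficient is taken to be Pearson's r, and the helper names binned_means and pearson_r are hypothetical. Significance testing (the reported p-values) is not covered.

```python
import numpy as np

def binned_means(nd, ftd, n_bins=10):
    """Mean ND and mean FTD per bin, with users sorted by ND.

    Assumption: bins contain (nearly) equal numbers of users.
    """
    nd = np.asarray(nd, dtype=float)
    ftd = np.asarray(ftd, dtype=float)
    order = np.argsort(nd)                    # user indices sorted by ND
    groups = [g for g in np.array_split(order, n_bins) if len(g) > 0]
    nd_mean = np.array([nd[g].mean() for g in groups])
    ftd_mean = np.array([ftd[g].mean() for g in groups])
    return nd_mean, ftd_mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])
```

Applying binned_means separately to each effort class and passing the two returned arrays to pearson_r would give one coefficient per class, analogous to the per-quartile values quoted above.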


Figure 3: Friend topic diversity (FTDi) of a user as a function
of the network diversity (NDi) in the 2014 Twitter data
set. We show the average of FTDi for users with the same
network diversity (NDi), with their standard deviation ranges
in grey. Users in the higher network diversity positions
tend to be exposed to more diverse information, with active
users receiving more diverse information regardless of
their position in the network structure. We group ND values
into equal-sized bins and compute the mean of both ND and
FTD within each bin.
[Plot: x-axis ND (Network Diversity); legend: top quartile (5.3 < O), 2nd quartile (3.1 < O < 5.3), 3rd quartile (1.9 < O < 3.1), bottom quartile (O < 1.9).]

Why are highly active users exposed to more diverse information?
To address this question, we study how network
diversity changes as users add more friends. Figure 4 shows
this relationship for users separated into two classes based
on their activity or effort. Overall, network diversity increases
with network size (after around 100 friends), which
is not surprising since probabilistically as the number of people
in a network grows, any two people are less likely to be
connected to each other. Active users overall place themselves
in more structurally diverse positions.
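
To make the notion of structural diversity concrete, the sketch below computes one common measure: the fraction of pairs of a user's friends that are not linked to each other (one minus the density of links among friends). This is an assumption made for illustration; the definition of ND used in this paper is given in an earlier section and may differ. The function name network_diversity and the follows mapping are hypothetical.

```python
from itertools import combinations

def network_diversity(friends, follows):
    """Fraction of friend pairs with no link between them.

    friends: iterable of a user's friends.
    follows: dict mapping each user to the set of users they follow.
    Assumption: a pair counts as connected if either one follows the other.
    Returns None when the user has fewer than two friends.
    """
    pairs = list(combinations(list(friends), 2))
    if not pairs:
        return None
    connected = sum(
        1 for a, b in pairs
        if b in follows.get(a, set()) or a in follows.get(b, set())
    )
    return 1.0 - connected / len(pairs)
```

Under this measure, a user whose friends all follow one another has diversity 0, and a user whose friends are mutually unconnected has diversity 1, which matches the statement that minimum network diversity corresponds to maximal social connectivity.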

Surprisingly, network diversity initially decreases with
network size for both user populations, reaching a minimum
around S = 100. A potential explanation of this effect involves
the Dunbar number. Dunbar (Dunbar 1992) argued
that finite human cognitive capacity constrains the number
of social interactions individuals can manage, limiting the size
of social groups to about 100-200 individuals. Research has
validated the impact of cognitive constraints on online social
interactions (Goncalves, Perra, and Vespignani 2011;
Kang and Lerman 2013b). Similar arguments could apply
to our setting. Minimum network diversity corresponds to
maximal social connectivity, which in our Twitter data set
occurs when users have around 100 friends. While their social
networks can grow beyond that size, increasing network
diversity implies that new friends are less likely to form a
community.

The minimum in network diversity for the less active
users occurs at lower values of S than for the more active users.
This suggests that active users who invest more effort into
using Twitter can manage larger communities of connected
friends than the less active users. This observation is in line
with the theory of cognitive limits on social interactions: users
who have a greater capacity for social interactions (or who may
simply be willing to invest more time and effort in social
interactions) will have more interactions on Twitter (higher
activity), and they will also tend to belong to larger social
groups (higher network size), simply because they are better
able to manage their social connections. At this time
we cannot prove this intriguing possibility, and leave it as a
question for future research.

Figure 4: Network diversity (ND) as a function of the number
of active friends (S) in the 2014 Twitter data set. We use
equal-sized bins for each class.
[Plot: x-axis S (number of friends); legend: top 50% users, bottom 50% users.]

Figure 5: Histograms of network diversity (ND) of users in
the 2014 Twitter data set. Users are divided into two populations
based on their effort (O). The peak for the top 50% of users is
higher than that for the bottom 50% of users, while the bottom
50% of users tend to have higher ND.

Related Work 

A pair of classic theories has linked an individual’s position
within a network to the novelty and diversity of information
she receives through her social contacts. The theoretical argument,
known as “the strength of weak ties” (Granovetter
1973), explored the relationship between social links and the
information people receive along those links. Specifically,
the weak links, representing infrequent social interactions,
were shown to deliver novel information to people, providing
new social and economic opportunities (Uzzi 1997;
Reagans and Zuckerman 2001; Reagans and McEvily 2003;
Allen 2003).


Burt (Burt 1995; Burt 2004; Burt 2005) argued that weak
ties act as bridges between different communities. Individuals
with many such ties are in what he termed “brokerage
positions” in the network, which give them access to, and
allow them to benefit from, novel information residing in
diverse sources. Empirical research on mobile phone (Onnela
et al. 2007), email communication (Aral and Van Alstyne
2011; Iribarren and Moro 2011), and online social
networks (Grabowicz et al. 2011; Centola and Macy 2007;
Centola 2010) supported the weak ties arguments about the
nature of interactions on a network and its structure.

Aral & Van Alstyne showed that both structurally diverse
brokerage positions in the network and high frequency communication
along social ties provided access to diverse and
novel information in the email communication network. In
social media, Kang & Lerman (Kang and Lerman 2013b)
showed that increasing the activity of the social media friends
a user follows affected how much novel information the user
received from them, while increasing network diversity provided
access to more topically diverse information, but not the other
way around. Bakshy et al. (Bakshy et al. 2012) showed that,
although strong ties are individually more influential, weak
ties increased the diversity of information received.

Cognitive constraints on social interactions provide an interesting
perspective on the structure and function of social
networks. Dunbar argued that people have a limited ability,
defined by their brain’s capacity, to manage social interactions,
which gives rise to a maximum social group size (Dunbar
2003). Although social media was believed to expand
the size of human social networks, research showed that the
maximum number of friends that Twitter users interact with
is around 100-200 (Goncalves, Perra, and Vespignani 2011),
similar to the Dunbar number. Cognitive constraints could
also explain the findings of (Aral and Van Alstyne 2011;
Aral and David 2012), namely that cognitive constraints create
a trade-off between the complexity of social interactions
(given by network diversity) and the intensity of interactions
along structurally complex links, resulting in a “diversity-bandwidth
trade-off.” Unlike previous researchers, we examined
how users vary in their capacity for social interactions
(or activity), and how this capacity defines their level
of engagement with the social media site and access to diverse
information.

Recommender systems (Herlocker et al. 1999; Sarwar et al.
2001; Karypis 2000) examine item ratings of many people
to discover their preferences and recommend new items that
were liked by similar people. Latent-factor models, such as
probabilistic matrix factorization (Salakhutdinov and Mnih
2008; Koren, Bell, and Volinsky 2009; Wang and Blei 2011),
have shown promise in creating better recommendations
by incorporating personal relevance into the model. Many
social recommender systems apply matrix factorization
techniques to both users’ social networks
and their item rating histories (Ma et al. 2008). In addition
to modeling user-item adoptions, researchers integrate social
correlation between users (Purushotham, Liu, and Kuo
2012), topic influences of friends (Kang and Lerman 2013a),
and cognitive biases (Kang and Lerman 2015) into social
recommender systems.

Recommender systems often focus on understanding user
preferences based on the history of observed actions to recommend
possible future likes and interests. One of the key
challenges is how to increase the variety of recommended
items without sacrificing accuracy. The
trade-off between exploration and exploitation is important
to prevent over-specialization, in which items outside the
history of a user’s actions are never recommended. Most of the
current approaches focus on proposing new intra-list diversity
metrics (Ziegler et al. 2005; Agrawal et al. 2009) to diversify
recommendations. Our study shows that users increase
activity to access diverse information. We can estimate how
open a user is to diverse information by taking into account
the engagement level as well as the network diversity
of the user.


Conclusion 


The idea that network structure affects the novelty and diversity
of information people receive from their social contacts
has long fascinated sociologists (Granovetter 1973;
Burt 1995). However, humans also have a finite cognitive
capacity, which constrains how many social relations they
are able to manage (Dunbar 1992). The interplay between
network structure and cognitive constraints has important
implications for how people gain access to information in
social networks in general, and on social media in particular.
In this paper, we explored these questions using data
from a popular social media platform, Twitter, where users
create links in order to receive information, in the form of
short text messages called tweets, from other people.

One of the challenges we faced was measuring the diversity
of information users receive from their friends on Twitter.
We addressed this challenge by using a probabilistic model
to learn users’ topics of interest from the messages they receive
and share on Twitter. Our model incorporates the text
of messages and a user’s network in a generative model of
information spread. We then used the learned topics to measure
the diversity of the information a user is exposed to as the
variance of topic interests of the user’s friends.
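
The variance-based diversity measure can be sketched as follows. The sketch assumes that each friend is represented by a learned topic distribution (one row per friend) and that per-topic variances across friends are averaged into a single score; that aggregation step and the name friend_topic_diversity are assumptions for illustration, not the paper's exact formula.

```python
import numpy as np

def friend_topic_diversity(friend_topics):
    """Variance of friends' topic interests, averaged over topics.

    friend_topics: array of shape (n_friends, n_topics); each row is one
    friend's topic distribution (hypothetical input format).
    """
    theta = np.asarray(friend_topics, dtype=float)
    if theta.ndim != 2 or theta.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two friends")
    # Population variance of each topic's weight across friends,
    # then the mean over topics.
    return float(theta.var(axis=0).mean())
```

Friends with identical topic interests give a score of 0, while friends concentrated on different topics give a larger score.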

By quantifying information diversity, we can study the
factors that affect information access in networks. We confirmed
that network position plays an important role: users
can increase the amount of diverse information they receive
by increasing the structural diversity of their network position,
rather than simply increasing the number of people
they follow. However, we also identified user effort as an important
factor mediating access to information in networks.
Users who post (and consume) more messages place themselves
in positions of higher network diversity than the less
active users. Even when they are in structurally similar positions,
the more active users receive more diverse information.
This suggests that users who invest greater effort into
using Twitter may have higher cognitive capacity for processing
information, or they may simply be able to devote
more time to such interactions (Miritello et al. 2013b). These
users curate their links so as to increase the diversity of information
they receive. One mechanism for accomplishing
this is to break links so as to reduce the redundancy of received
information. Even when these actions do not change
a user’s structural position within the network, they serve
to increase information diversity. Our work underscores the
importance of cognitive factors and variation in effort in access
to information in networks. Further work is needed to
disentangle these factors.